fix: validate tensor dims_count against the shared-memory region #438

Open
LinZiyuu wants to merge 1 commit into triton-inference-server:main from LinZiyuu:fix/validate-tensor-dims-count

@LinZiyuu

This validates the tensor dims_count read from shared memory before it is used, extending the shared-memory boundary validation added in #405 / #406.

Previously, PbTensor::LoadFromSharedMemory() took dims_count from the shared-memory region and used it to compute name_offset and to build the dims vector (std::vector<int64_t>(dims_ptr, dims_ptr + dims_count)) with no bounds check. A corrupted dims_count makes the parent process perform a large out-of-bounds read and crash the server, and the sizeof(int64_t) * dims_count product can overflow into a small, controlled name_offset. The #406 fix bounded MemoryShm::byte_size, but not this dimension count.
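The wrap-around mentioned above can be illustrated with a small sketch. It assumes `dims_count` behaves like a 64-bit unsigned value and that `sizeof(int64_t)` is 8; `UncheckedDimsBytes` is a hypothetical helper written for this illustration, not a function from the backend.

```cpp
#include <cstdint>

// Hypothetical stand-in for the unchecked size computation described above.
// dims_count is modeled as a 64-bit unsigned value (an assumption).
constexpr std::uint64_t UncheckedDimsBytes(std::uint64_t dims_count) {
  return static_cast<std::uint64_t>(sizeof(std::int64_t)) * dims_count;
}

// 8 * (2^61 + 2) = 2^64 + 16, which wraps to 16 in 64-bit unsigned arithmetic,
// so a huge count can produce a small, attacker-chosen byte offset.
static_assert(UncheckedDimsBytes((std::uint64_t{1} << 61) + 2) == 16,
              "a huge dims_count wraps to a small byte count");
```

Any offset derived from this product inherits the wrapped value, which is why the count has to be bounded before the multiplication happens.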

This change validates dims_count so the dims array stays within the region before it is used, throwing PythonBackendException otherwise. The check uses division to avoid overflowing the product, and mirrors the MemoryShm::byte_size boundary check. Valid tensors are unaffected.
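A minimal sketch of a division-based bound, under stated assumptions: the dims array is taken to start `dims_offset` bytes into a region of `region_size` bytes, and `std::runtime_error` stands in for PythonBackendException. The names `DimsFitInRegion`, `ValidateDimsCount`, `region_size`, and `dims_offset` are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical check: true when `dims_count` int64_t entries starting at
// `dims_offset` fit inside a region of `region_size` bytes.
// Dividing the remaining bytes by the element size avoids computing
// sizeof(int64_t) * dims_count, so nothing can overflow.
constexpr bool DimsFitInRegion(std::uint64_t region_size,
                               std::uint64_t dims_offset,
                               std::uint64_t dims_count) {
  return dims_offset <= region_size &&
         dims_count <= (region_size - dims_offset) / sizeof(std::int64_t);
}

// Hypothetical wrapper: rejects a bad count before it is used.
// std::runtime_error is a stand-in for PythonBackendException.
inline void ValidateDimsCount(std::uint64_t region_size,
                              std::uint64_t dims_offset,
                              std::uint64_t dims_count) {
  if (!DimsFitInRegion(region_size, dims_offset, dims_count)) {
    throw std::runtime_error(
        "tensor dims_count exceeds the shared-memory region");
  }
}
```

With a 64-byte region and the dims array at offset 16, at most (64 - 16) / 8 = 6 dimensions fit; a count of 7, or a count large enough to wrap the product, is rejected. The first comparison also guards the subtraction against underflow when the offset itself lies outside the region.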

Reproduced on nvcr.io/nvidia/tritonserver:26.04-py3 (CPU): a model that overwrites a live output tensor's dims_count in the backend shared memory crashes the whole server (Exited (139), SIGSEGV) within seconds, via python_be.cc -> InferResponse::LoadFromSharedMemory -> PbTensor::LoadFromSharedMemory. With this change the corrupted tensor is rejected with an error instead of faulting.

The sibling unbounded values read from shared memory have the same pattern and are worth a follow-up: InferResponse::outputs_size, InferRequest::requested_output_count/input_count, PbMap::length, MessageQueue::Pop's tail, and the object handles passed to SharedMemoryManager::Load<T>.

PbTensor::LoadFromSharedMemory() reads `dims_count` from shared memory and
uses it to compute `name_offset` and to construct the dims vector
(`std::vector<int64_t>(dims_ptr, dims_ptr + dims_count)`) without checking it
against the region. A corrupted `dims_count` (e.g. written by a model into the
backend shm) makes the parent process perform a large out-of-bounds read,
crashing the server; the `sizeof(int64_t) * dims_count` product can also
overflow and yield a small, controlled `name_offset`.

Validate `dims_count` so that the dims array stays within the shared-memory
region before it is used, throwing PythonBackendException otherwise. The check
uses division to avoid overflowing the product, and mirrors the
MemoryShm::byte_size boundary check. Valid tensors are unaffected.

Signed-off-by: LinZiyuu <linziyu0205@163.com>